神经网络(NNS)的重要性和复杂性正在增长。神经网络的性能(和能源效率)可以通过计算或内存资源约束。在内存阵列附近或内部放置计算的内存处理(PIM)范式是加速内存绑定的NNS的可行解决方案。但是,PIM体系结构的形式各不相同,其中不同的PIM方法导致不同的权衡。我们的目标是分析基于NN的性能和能源效率的基于DRAM的PIM架构。为此,我们分析了三个最先进的PIM架构:(1)UPMEM,将处理器和DRAM阵列集成到一个2D芯片中; (2)Mensa,是针对边缘设备量身定制的基于3D堆栈的PIM架构; (3)Simdram,它使用DRAM的模拟原理来执行位序列操作。我们的分析表明,PIM极大地受益于内存的NNS:(1)UPMEM在GPU需要内存过度按要求的通用矩阵 - 矢量乘数内核时提供23x高端GPU的性能; (2)Mensa在Google Edge TPU上提高了3.0倍和3.1倍的能源效率和吞吐量,用于24个Google Edge NN型号; (3)SIMDRAM在三个二进制NNS中以16.7倍/1.4倍的速度优于CPU/GPU。我们得出的结论是,由于固有的建筑设计选择,NN模型的理想PIM体系结构取决于模型的独特属性。
translated by 谷歌翻译
训练机学习(ML)算法是一个计算密集型过程,由于反复访问大型培训数据集,经常会陷入内存。结果,以处理器为中心的系统(例如CPU,GPU)遭受了内存单元和处理单元之间的昂贵数据移动,这会消耗大量的能量和执行周期。以内存为中心的计算系统,即具有内存(PIM)功能,可以减轻此数据运动瓶颈。我们的目标是了解现代通用PIM体系结构加速ML培训的潜力。为此,我们(1)在现实世界通用PIM体系结构上实现了几种代表性的经典ML算法(即线性回归,逻辑回归,决策树,K-均值聚类),(2)严格评估并表征它们在准确性,性能和缩放方面以及(3)与CPU和GPU上的对应物实现相比。我们对具有2500多个PIM核心的真实内存计算系统的评估表明,当PIM硬件在必要的操作和数据类型上,通用PIM架构可以极大地加速内存的ML工作负载。例如,我们对决策树的PIM实施比8核Intel Xeon上的最先进的CPU版本$ 27 \ times $ $,并且比最先进的GPU快$ 1.34 \ times $ $ NVIDIA A100上的版本。我们在PIM上的K-Means聚类分别为$ 2.8 \ times $和$ 3.2 \ times $ $,分别是最先进的CPU和GPU版本。据我们所知,我们的工作是第一个评估现实世界中PIM架构的ML培训的工作。我们以关键的观察,外卖和建议结束,可以激发ML工作负载的用户,PIM架构的程序员以及未来以内存计算系统的硬件设计师和架构师。
translated by 谷歌翻译
训练机学习算法是一个计算密集型过程,由于反复访问大型培训数据集,因此经常会限制内存。结果,以处理器为中心的系统(例如CPU,GPU)遭受了内存单元和处理单元之间的昂贵数据移动,这会消耗大量的能量和执行周期。以内存为中心的计算系统,即具有内存处理(PIM)功能的计算系统,可以减轻此数据运动瓶颈。我们的目标是了解现代通用PIM体系结构加速机器学习培训的潜力。为此,我们(1)将几种代表性的经典机器学习算法(即线性回归,逻辑回归,决策树,K-均值聚类)上实现在现实世界通用PIM架构上(2)以术语来表征它们与CPU和GPU上的同行实现相比,(3)将其准确性,性能和缩放率进行比较。我们对具有2500多个PIM核心的内存计算系统进行的实验评估表明,当PIM硬件在必要的操作和数据类型上,通用PIM体系结构可以极大地加速记忆的机器学习工作负载。据我们所知,我们的工作是第一个评估现实世界通用PIM体系结构的机器学习算法培训的工作。
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
我们重新访问重尾损坏的最小二乘线性回归,假设最多损坏了$ n $ n $ n $ sized的标签 - 功能样本,最多是$ \ epsilon n $ nutialary Outliers。我们希望估计给定标签 - 功能对$(y,x)$满足$ y = \ y = \ langle x,b^*\ rangle+xi $的标签 - 功能对$(y,x)$的样本给定$ p $ -dimensional参数$ b^*$ - 尾$(x,\ xi)$。我们只假设$ x $ is $ l^4-l^2 $超债券与常数$ l> 0 $,并具有协方差矩阵$ \ sigma $,最低eigenvalue $ 1/\ mu^2> 0 $和有限条件号$ \ \ \ \ \ \ \ \ kappa> 0 $。只要$ \ xi x $具有有限的协方差矩阵$ \ xi $,噪声$ \ xi $可以任意取决于$ x $,而非对称性。我们提出了一个基于功率方法的近乎最佳的计算估计器,假设对$(\ sigma,\ xi)$也不了解$ \ xi $的运算符规范。如果概率至少$ 1- \ delta $,我们提出的估计器达到了统计率$ \ mu^2 \ vert \ xi \ xi \ vert^{1/2}(\ frac {p} {n} {n}+\ frac {\ log(\ log(\ log( 1/\ delta)}} {n}+\ epsilon)^{1/2} $ and beckdown-point $ \ epsilon \ epsilon \ sillesim \ frac {1} {l^4 \ kappa^2} $ \ ell_2 $ - norm,假设最小最小样本大小$ l^4 \ kappa^2(p \ log p + p + \ log(1/\ delta))\ sillsim n $,最多为log fix因数。据我们所知,这是同时满足所有提到的所有属性的第一个计算障碍算法。我们的估计器基于两阶段的乘量重量更新算法。第一阶段估计了(未知)预先条件的内部产品$ \ langle \ sigma(\ cdot),\ cdot \ rangle $。第二阶段估计下降方向$ \ sigma \ hat v $相对于(已知的)内部产品$ \ langle \ cdot,\ cdot \ rangle $,而无需了解或估计$ \ sigma $。
translated by 谷歌翻译
本文基于Loeffler离散余弦变换(DCT)算法引入了矩阵参数化方法。结果,提出了一类新的八点DCT近似值,能够统一文献中几个八点DCT近似的数学形式主义。帕累托效率的DCT近似是通过多准则优化获得的,其中考虑了计算复杂性,接近性和编码性能。有效的近似及其缩放的16和32点版本嵌入了图像和视频编码器中,包括类似JPEG的编解码器以及H.264/AVC和H.265/HEVC标准。将结果与未修饰的标准编解码器进行比较。在Xilinx VLX240T FPGA上映射并实现了有效的近似值,并评估了面积,速度和功耗。
translated by 谷歌翻译
This paper presents a machine learning approach to multidimensional item response theory (MIRT), a class of latent factor models that can be used to model and predict student performance from observed assessment data. Inspired by collaborative filtering, we define a general class of models that includes many MIRT models. We discuss the use of penalized joint maximum likelihood (JML) to estimate individual models and cross-validation to select the best performing model. This model evaluation process can be optimized using batching techniques, such that even sparse large-scale data can be analyzed efficiently. We illustrate our approach with simulated and real data, including an example from a massive open online course (MOOC). The high-dimensional model fit to this large and sparse dataset does not lend itself well to traditional methods of factor interpretation. By analogy to recommender-system applications, we propose an alternative "validation" of the factor model, using auxiliary information about the popularity of items consulted during an open-book exam in the course.
translated by 谷歌翻译
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and point's permutation, which generates PCDs that are geometrically consistent and completed properly. Experiments on a wide range of partial PCD show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
translated by 谷歌翻译
Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often results in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data and more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods to better aid clinicians in their decision-making to improve patient outcomes, by reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture.
translated by 谷歌翻译
Using Structural Health Monitoring (SHM) systems with extensive sensing arrangements on every civil structure can be costly and impractical. Various concepts have been introduced to alleviate such difficulties, such as Population-based SHM (PBSHM). Nevertheless, the studies presented in the literature do not adequately address the challenge of accessing the information on different structural states (conditions) of dissimilar civil structures. The study herein introduces a novel framework named Structural State Translation (SST), which aims to estimate the response data of different civil structures based on the information obtained from a dissimilar structure. SST can be defined as Translating a state of one civil structure to another state after discovering and learning the domain-invariant representation in the source domains of a dissimilar civil structure. SST employs a Domain-Generalized Cycle-Generative (DGCG) model to learn the domain-invariant representation in the acceleration datasets obtained from a numeric bridge structure that is in two different structural conditions. In other words, the model is tested on three dissimilar numeric bridge models to translate their structural conditions. The evaluation results of SST via Mean Magnitude-Squared Coherence (MMSC) and modal identifiers showed that the translated bridge states (synthetic states) are significantly similar to the real ones. As such, the minimum and maximum average MMSC values of real and translated bridge states are 91.2% and 97.1%, the minimum and the maximum difference in natural frequencies are 5.71% and 0%, and the minimum and maximum Modal Assurance Criterion (MAC) values are 0.998 and 0.870. This study is critical for data scarcity and PBSHM, as it demonstrates that it is possible to obtain data from structures while the structure is actually in a different condition or state.
translated by 谷歌翻译